Abstract

This article discusses a partially adapted particle filter for estimating the likelihood
of nonlinear structural econometric state space models whose state transition density
cannot be expressed in closed form. The filter generates the disturbances in the state
transition equation and allows for multiple modes in the conditional disturbance
distribution. The particle filter produces an unbiased estimate of the likelihood and so
can be used to carry out Bayesian inference in a particle Markov chain Monte Carlo
framework. We show empirically that when the signal to noise ratio is high, the new
filter can be much more efficient than the standard particle filter, in the sense that
it requires far fewer particles to give the same accuracy. The new filter is applied to
several simulated and real examples and in particular to a dynamic stochastic general
equilibrium model.

Keywords: DSGE model; Multi-modal; Partially adapted particle filter; State space model

1 Introduction

For a general state space model the standard particle filter (Gordon et al., 1993) gives an
unbiased estimate of the likelihood. Andrieu et al. (2010) show that it is possible to use
this unbiased estimate within a Markov chain Monte Carlo (MCMC) sampling scheme to
carry out Bayesian inference for the parameters of the state space model. They call such a
sampling scheme particle MCMC (PMCMC). PMCMC is particularly useful for Bayesian
inference when the state space model is nonlinear or non-Gaussian so that the Kalman
filter cannot be used. However, when the signal to noise ratio of the model is high, i.e., 
when the observation vector gives a very informative measurement on some combination(s) 
of elements of the state vector, the standard particle filter becomes a computationally 
inefficient importance sampler (see Pitt and Shephard (1999)). 

For many models, this problem can be solved by using adapted particle filters as in Pitt
and Shephard (1999), which are more efficient as importance samplers. In addition, Pitt
et al. (2012) show empirically that fully adapted particle filters used within PMCMC may
require far fewer particles than the standard particle filter to achieve the same accuracy.

While using adapted particle filters can be much more efficient than the standard 
particle filter, most adapted particle filters require that we can evaluate the state transition 
density. In important cases, this density is not easily available in closed form. This 
applies, in particular, to dynamic stochastic general equilibrium (DSGE) models, which 
are currently widely used in applied macroeconomics. This is in contrast to the standard 
particle filter which only requires that we can evaluate the observation density and simulate 
from the state transition density. 

Our article proposes to solve this problem by using a partially adapted particle filter that 
generates the states by first generating the disturbances in the state transition equation. 
The idea that a stochastic process may become more tractable when considered in terms 
of its innovations has a long history (see Heunis (2011)). We employ it here because it 
provides a simple and useful solution to the problem described above. This approach has 
also been considered in recent independent work by Murray et al. (2012), who use it to 
estimate biological models with intractable transition densities. Our approach differs in 
its emphasis on solutions tailored for structural econometrics. Specifically, we demonstrate
that a proposal that is based on a numerical optimisation algorithm, and that allows for
multiple modes in the disturbances by using mixtures, delivers large efficiency gains over
the standard particle filter and the filter in Murray et al. (2012) when applied
to rational-expectations models with high signal-to-noise ratios. Murray et al. (2012)
use a sigma-point approximation to the conditional densities of the disturbances.

Two other possible improvements to the particle filter for rational expectations models
have been proposed in recent literature. First, Amisano and Tristani (2010) demonstrate that
the particle filter can be made more efficient if the proposal distribution for the state vector
$x_t$ is conditioned on the first two moments of $x_{t-1}$, estimated from the particle swarm in
the previous period. Second, Andreasen (2011) demonstrates an improvement when the
proposal density for each particle is based on a central-difference Kalman filter with a
rescaled covariance. While these contributions are valuable, we believe that the algorithm
discussed here improves on these methods, since it is able
to deliver results using a relatively small number of particles.

Section 2 describes the partially adapted particle filter used in our article and gives
its properties. Section 3 demonstrates its performance on simulated data from a simple
nonlinear time series model. Section 4 demonstrates its application to nonlinear DSGE
models: specifically, a neoclassical growth model and a consumption-based asset pricing
model. Section 5 concludes.



2 Auxiliary Disturbance Particle Filter 

This section discusses a particle filter which is effective for certain classes of models explored
in this paper. This filter, which we call the auxiliary disturbance particle filter (ADPF), is
similar in principle to the auxiliary particle filter but works by sampling the disturbances in
the state equation rather than the states themselves. More specifically, while the auxiliary
particle filter works by building an approximation to the joint density of $(x_{t-1}, x_t)'$, given
the previous observations $y_{1:t-1} = \{y_1, \dots, y_{t-1}\}$ and $y_t$, the ADPF builds an approximation
to $(x_{t-1}, u_t)'$, given $y_{1:t}$, where $u_t$ is the disturbance term in the state equation. To simplify
the notation in this section, we often omit to show dependence on unknown parameters.

2.1 General principles 

Consider the state space model with measurement density $p(y_t \mid x_t)$ and state transition
equation

$$x_t = h(x_{t-1}, u_t), \tag{1}$$

where $u_t$ is an independent sequence and $h(x_{t-1}, u_t)$ is a nonlinear function of $x_{t-1}$ and
$u_t$. Given a sample $y_{1:T} = \{y_1, \dots, y_T\}$ and the unknown parameters, the likelihood
is $p(y_{1:T}) = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1})$, where the $t = 1$ factor is $p(y_1)$. When the function $h(\cdot)$ is nonlinear, it is usually
impossible to evaluate the likelihood exactly, but we can estimate it using one of a number of
particle filters. The standard particle filter (Gordon et al., 1993) can be used whenever it is
possible to evaluate the measurement density $p(y_t \mid x_t)$ and generate from the state
transition equation (1). However, the standard particle filter can be quite inefficient in the sense
that it produces a likelihood estimate with a large variance compared to more sophisticated
particle filters (see Pitt et al., 2012). Pitt and Shephard (1999) suggest a class of auxiliary
particle filters that can be much more efficient than the standard particle filter, but most
of these require the evaluation of the density of the state transition equation. However,
for many cases that are of interest to us, it is infeasible to evaluate the state transition
equation density, which means that existing auxiliary particle filters cannot be used.

The ADPF attempts to overcome this problem by approximating the state space model
with measurement density $p(y_t \mid x_t)$ and state transition equation (1) as follows. Suppose
that the density $g(y_{t+1} \mid x_t)$ approximates $p(y_{t+1} \mid x_t)$ and the density $g(u_{t+1} \mid y_{t+1}, x_t)$
approximates the density $p(u_{t+1} \mid y_{t+1}, x_t)$, and that we can evaluate $g(y_{t+1} \mid x_t)$ and $g(u_{t+1} \mid y_{t+1}, x_t)$
and generate from $g(u_{t+1} \mid y_{t+1}, x_t)$. The choice of these approximating densities may be
model-specific, although we discuss a general implementation in Section 3.2.

To explain intuitively how the ADPF is constructed, suppose that $\{(x_t^k, \pi_t^k), k = 1, \dots, N\}$
is a swarm of particles generated from an approximation to $p(x_t \mid y_{1:t})$, so that $x_t^k$ has
associated weight $\pi_t^k$. Now, define

$$\hat{p}(dx_t \mid y_{1:t}) = \sum_{k=1}^{N} \pi_t^k \, \delta_{x_t^k}(dx_t),$$

where $\delta_s(dx)$ is the Dirac delta distribution centered at $s$. We shall show how to construct
$\hat{p}(dx_{t+1} \mid y_{1:t+1})$.

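Suppose we wish to estimate $\mathrm{E}(m(x_{t+1}) \mid y_{1:t+1})$, where $m(\cdot)$ is a function of $x_{t+1}$,
assuming that the expectation exists. Define the functional

$$
\begin{aligned}
J_{t+1}(m) &= \int m(x_{t+1})\, p(y_{t+1} \mid x_t, u_{t+1})\, p(x_t, u_{t+1} \mid y_{1:t})\, dx_t\, du_{t+1} \\
&= \int m(x_{t+1})\, p(y_{t+1} \mid x_t, u_{t+1})\, p(u_{t+1})\, p(x_t \mid y_{1:t})\, dx_t\, du_{t+1} \\
&= \int m(x_{t+1})\, p(y_{t+1} \mid x_t)\, p(u_{t+1} \mid y_{t+1}, x_t)\, p(x_t \mid y_{1:t})\, dx_t\, du_{t+1},
\end{aligned}
\tag{2}
$$

using the identity $p(y_{t+1} \mid x_t, u_{t+1})\, p(u_{t+1}) = p(y_{t+1} \mid x_t)\, p(u_{t+1} \mid y_{t+1}, x_t)$. Then, it is
straightforward to check that $\mathrm{E}(m(x_{t+1}) \mid y_{1:t+1}) = J_{t+1}(m) / J_{t+1}(1)$, where $1$ is the unit function,
and $p(y_{t+1} \mid y_{1:t}) = J_{t+1}(1)$.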

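We approximate $J_{t+1}(m)$ by replacing $p(x_t \mid y_{1:t})\, dx_t$ in (2) by $\hat{p}(dx_t \mid y_{1:t})$ to obtain

$$
\begin{aligned}
J_{t+1}(m) &\approx \int m(x_{t+1})\, p(y_{t+1} \mid h(x_t; u_{t+1}))\, p(u_{t+1})\, \hat{p}(dx_t \mid y_{1:t})\, du_{t+1} \\
&= \int m(x_{t+1})\, \frac{p(y_{t+1} \mid h(x_t; u_{t+1}))\, p(u_{t+1})}{g(u_{t+1} \mid y_{t+1}, x_t)\, g(y_{t+1} \mid x_t)}\, g(y_{t+1} \mid x_t)\, g(u_{t+1} \mid y_{t+1}, x_t)\, \hat{p}(dx_t \mid y_{1:t})\, du_{t+1} \\
&= \Big( \sum_{j=1}^{N} \omega_{t|t+1}^{j} \Big) \iint m(x_{t+1})\, \frac{p(y_{t+1} \mid h(x_t; u_{t+1}))\, p(u_{t+1})}{g(u_{t+1} \mid y_{t+1}, x_t)\, g(y_{t+1} \mid x_t)}\, g(u_{t+1} \mid y_{t+1}, x_t)\, g_N(dx_t \mid y_{1:t+1})\, du_{t+1},
\end{aligned}
$$

where

$$g_N(dx_t \mid y_{1:t+1}) = \sum_{k=1}^{N} \pi_{t|t+1}^{k}\, \delta_{x_t^k}(dx_t), \qquad \pi_{t|t+1}^{k} = \frac{\omega_{t|t+1}^{k}}{\sum_{j=1}^{N} \omega_{t|t+1}^{j}}, \qquad \omega_{t|t+1}^{k} = g(y_{t+1} \mid x_t^k)\, \pi_t^k .$$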

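Suppose that $u_{t+1}^k \sim g(u_{t+1} \mid y_{t+1}, x_t^k)$ and $x_{t+1}^k = h(x_t^k, u_{t+1}^k)$, and put

$$\omega_{t+1}^{k} = \frac{p(y_{t+1} \mid x_{t+1}^k)\, p(u_{t+1}^k)}{g(u_{t+1}^k \mid y_{t+1}, x_t^k)\, g(y_{t+1} \mid x_t^k)}, \qquad \pi_{t+1}^{k} = \frac{\omega_{t+1}^{k}}{\sum_{i=1}^{N} \omega_{t+1}^{i}} .$$

We obtain the estimate

$$\hat{J}_{t+1}(m) = \Big( \sum_{i=1}^{N} \omega_{t|t+1}^{i} \Big) \Big( \frac{1}{N} \sum_{j=1}^{N} \omega_{t+1}^{j} \Big) \sum_{k=1}^{N} m(x_{t+1}^k)\, \pi_{t+1}^{k}, \qquad \text{with} \qquad \hat{J}_{t+1}(1) = \Big( \sum_{i=1}^{N} \omega_{t|t+1}^{i} \Big) \Big( \frac{1}{N} \sum_{i=1}^{N} \omega_{t+1}^{i} \Big), \tag{4}$$

so that

$$\hat{\mathrm{E}}(m(x_{t+1}) \mid y_{1:t+1}) = \sum_{k=1}^{N} m(x_{t+1}^k)\, \pi_{t+1}^{k}, \qquad \hat{p}(y_{t+1} \mid y_{1:t}) = \Big( \sum_{i=1}^{N} \omega_{t|t+1}^{i} \Big) \Big( \frac{1}{N} \sum_{i=1}^{N} \omega_{t+1}^{i} \Big).$$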



This suggests that we take $\{(x_{t+1}^k, \pi_{t+1}^k), k = 1, \dots, N\}$ as the swarm of particles that
we use to approximate $p(x_{t+1} \mid y_{1:t+1})$ and define

$$\hat{p}(dx_{t+1} \mid y_{1:t+1}) = \sum_{k=1}^{N} \pi_{t+1}^{k}\, \delta_{x_{t+1}^k}(dx_{t+1}).$$



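The following algorithm formally describes the ADPF and is initialized with a sample
$x_0^k \sim p(x_0)$ with mass $\pi_0^k = 1/N$ for $k = 1, \dots, N$.

Algorithm 1. For $t = 0, \dots, T - 1$, given samples $x_t^k \sim p(x_t \mid y_{1:t})$ with mass $\pi_t^k$ for $k =
1, \dots, N$.

1. For $k = 1 : N$, compute $\omega_{t|t+1}^k = g(y_{t+1} \mid x_t^k)\, \pi_t^k$ and $\pi_{t|t+1}^k = \omega_{t|t+1}^k / \sum_{i=1}^{N} \omega_{t|t+1}^i$.

2. For $k = 1 : N$, sample $\tilde{x}_t^k \sim \sum_{i=1}^{N} \pi_{t|t+1}^i\, \delta_{x_t^i}(dx_t)$.

3. For $k = 1 : N$, sample $u_{t+1}^k \sim g(u_{t+1} \mid \tilde{x}_t^k; y_{t+1})$ and put $x_{t+1}^k = h(\tilde{x}_t^k; u_{t+1}^k)$.

4. For $k = 1 : N$, compute

$$\omega_{t+1}^{k} = \frac{p(y_{t+1} \mid x_{t+1}^k)\, p(u_{t+1}^k)}{g(y_{t+1} \mid \tilde{x}_t^k)\, g(u_{t+1}^k \mid \tilde{x}_t^k, y_{t+1})}, \qquad \pi_{t+1}^{k} = \frac{\omega_{t+1}^{k}}{\sum_{i=1}^{N} \omega_{t+1}^{i}} .$$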



The estimate of the likelihood corresponding to the ADPF is

$$\hat{p}_N(y_{1:T}) = \prod_{t=0}^{T-1} \Big( \sum_{i=1}^{N} \omega_{t|t+1}^{i} \Big) \Big( \frac{1}{N} \sum_{i=1}^{N} \omega_{t+1}^{i} \Big).$$

By Andrieu et al. (2010) we can write this estimate of the likelihood as $\hat{p}_N(y_{1:T} \mid \theta, \zeta)$,
where $\zeta$ consists of a set of canonical random variables that are used to construct the estimate
and that have density $p_N(\zeta)$. Without loss of generality we can assume that the elements of
$\zeta$ are independent uniform random variates. The estimated likelihood also depends on the
number of particles $N$.

Theorem 1. The estimate $\hat{p}_N(y_{1:T} \mid \zeta)$ is unbiased in the sense that

$$\int \hat{p}_N(y_{1:T} \mid \zeta)\, p_N(\zeta)\, d\zeta = p(y_{1:T}).$$

The proof is in Appendix A.
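
The steps of Algorithm 1 can be sketched in code. The following is a minimal illustration for a scalar state, not the authors' implementation: the function names (`adpf_loglik`, `linear_gaussian_demo`) and the callback interface are assumptions introduced here. As a sanity check it uses a linear Gaussian model, an assumed test case that does not appear in this section, for which $g(y_{t+1} \mid x_t)$ and $g(u_{t+1} \mid y_{t+1}, x_t)$ can be set to the exact densities, so that the second-stage weights are constant and the estimate can be compared with the Kalman filter.

```python
import numpy as np

def log_norm_pdf(z, mean, var):
    """Log density of N(mean, var) at z (vectorised)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (z - mean) ** 2 / var)

def log_sum_exp(a):
    """Numerically stable log(sum(exp(a)))."""
    c = a.max()
    return c + np.log(np.exp(a - c).sum())

def adpf_loglik(y, x0, h, log_p_y, log_p_u, log_g_y, draw_g_u, log_g_u, rng):
    """Log of the ADPF likelihood estimate for a scalar-state model."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    log_pi = np.full(n, -np.log(n))            # pi_0^k = 1/N
    loglik = 0.0
    for y_next in y:
        # Step 1: omega_{t|t+1}^k = g(y_{t+1} | x_t^k) * pi_t^k.
        log_w1 = log_g_y(y_next, x) + log_pi
        log_sum1 = log_sum_exp(log_w1)
        # Step 2: resample ancestors with probabilities pi_{t|t+1}^k.
        prob = np.exp(log_w1 - log_sum1)
        xa = x[rng.choice(n, size=n, p=prob / prob.sum())]
        # Step 3: draw disturbances, then propagate x_{t+1} = h(x_t, u_{t+1}).
        u = draw_g_u(y_next, xa, rng)
        x = h(xa, u)
        # Step 4: omega_{t+1}^k = p(y|x_{t+1}) p(u) / [g(u|y,x_t) g(y|x_t)].
        log_w2 = (log_p_y(y_next, x) + log_p_u(u)
                  - log_g_u(u, y_next, xa) - log_g_y(y_next, xa))
        log_sum2 = log_sum_exp(log_w2)
        log_pi = log_w2 - log_sum2             # pi_{t+1}^k
        # Increment: (sum_i omega_{t|t+1}^i) * (N^{-1} sum_i omega_{t+1}^i).
        loglik += log_sum1 + log_sum2 - np.log(n)
    return loglik

def linear_gaussian_demo(seed=0, n_particles=5000, n_obs=20,
                         phi=0.6, sigma_u=1.0, sigma_e=0.5):
    """Assumed test case: compare the ADPF (exact g) with the Kalman filter."""
    rng = np.random.default_rng(seed)
    p0 = sigma_u ** 2 / (1.0 - phi ** 2)       # stationary variance of x_0
    x, y = rng.normal(0.0, np.sqrt(p0)), []
    for _ in range(n_obs):
        x = phi * x + sigma_u * rng.normal()
        y.append(x + sigma_e * rng.normal())
    s2 = sigma_u ** 2 + sigma_e ** 2
    post_var = sigma_e ** 2 / s2               # Var(u | y, x_prev)
    post_mean = lambda yn, xp: sigma_u * (yn - phi * xp) / s2
    est = adpf_loglik(
        y, rng.normal(0.0, np.sqrt(p0), n_particles),
        h=lambda xp, u: phi * xp + sigma_u * u,
        log_p_y=lambda yn, xn: log_norm_pdf(yn, xn, sigma_e ** 2),
        log_p_u=lambda u: log_norm_pdf(u, 0.0, 1.0),
        log_g_y=lambda yn, xp: log_norm_pdf(yn, phi * xp, s2),
        draw_g_u=lambda yn, xp, r: (post_mean(yn, xp)
                                    + np.sqrt(post_var) * r.normal(size=xp.size)),
        log_g_u=lambda u, yn, xp: log_norm_pdf(u, post_mean(yn, xp), post_var),
        rng=rng)
    # Exact log-likelihood from the Kalman filter for the same data.
    m, p, exact = 0.0, p0, 0.0
    for yn in y:
        m, p = phi * m, phi ** 2 * p + sigma_u ** 2
        s = p + sigma_e ** 2
        exact += log_norm_pdf(yn, m, s)
        m, p = m + p / s * (yn - m), p * sigma_e ** 2 / s
    return est, exact
```

Calling `linear_gaussian_demo()` returns the ADPF estimate and the exact Kalman log-likelihood for the same simulated data. Working with log weights throughout is a design choice made here to avoid numerical underflow; it does not change the estimator.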



3 Particle filter performance on a first order nonlinear autoregressive model

3.1 The model 

Consider the following univariate nonlinear time series model,

$$y_t = x_t + \sigma_\epsilon \epsilon_t, \tag{6}$$
$$x_t = \phi x_{t-1} + \sigma_u \left( u_t + \delta u_t^2 \right), \tag{7}$$

where $\epsilon_t$ and $u_t$ are iid standard normal random variables. We choose this model because
it is one of the simplest nonlinear extensions to a well-understood linear model. When $\sigma_\epsilon$
is small it is also one of the simplest examples of the class of structural economic models
that we consider below. The parameter $\delta$ controls the degree of nonlinearity in the model;
with $\delta = 0$ the model is a first order autoregressive (AR(1)) model with observation noise.
When $\delta$ exceeds about 0.5, the behavior of the model becomes noticeably different to that
of a linear model. For a comprehensive analysis of this general class of models, see Aruoba
et al. (2011).
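
For concreteness, the model in equations (6) and (7) can be simulated as in the sketch below. The function name `simulate_quadratic_ar1` and the starting value `x0 = 0` are assumptions made for illustration; the parameter values in the usage lines are one of the settings examined in Section 3.2.

```python
import numpy as np

def simulate_quadratic_ar1(n_obs, phi, delta, sigma_u, sigma_e, rng, x0=0.0):
    """Simulate equations (6)-(7); returns (y, x, u) for t = 1..n_obs."""
    u = rng.normal(size=n_obs)          # state disturbances u_t
    eps = rng.normal(size=n_obs)        # measurement noise eps_t
    x = np.empty(n_obs)
    prev = x0                           # assumed starting state (not from the source)
    for t in range(n_obs):
        # Equation (7): quadratic function of the disturbance.
        x[t] = phi * prev + sigma_u * (u[t] + delta * u[t] ** 2)
        prev = x[t]
    y = x + sigma_e * eps               # equation (6)
    return y, x, u

# High nonlinearity, high signal to noise ratio setting.
rng = np.random.default_rng(1)
y, x, u = simulate_quadratic_ar1(50, phi=0.6, delta=0.7,
                                 sigma_u=1.0, sigma_e=0.01, rng=rng)
```

With `sigma_e` this small, the simulated `y` is almost identical to `x`, which is the regime in which the observation is highly informative about the state.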



3.2 ADPF implementation and testing 

We choose a normal distribution with mean $\phi x_{t-1} + \sigma_u \delta$ and variance $\sigma_\epsilon^2 + \sigma_u^2 (1 + 2\delta^2)$
for the approximating density $g(y_t \mid x_{t-1})$ because it matches the first two moments of
$y_t$, conditional on $x_{t-1}$. To estimate $g(u_t \mid y_t, x_{t-1}^k)$, we proceed as follows. Let $\ell^k(u) =
p(y_t \mid u, x_{t-1}^k)\, \phi(u; 0, 1)$ for given $x_{t-1}^k$ and $y_t$, where $\phi(u; a, b^2)$ is the univariate normal
density in $u$ with mean $a$ and variance $b^2$. We numerically maximise $\ell^k(u)$ over $u$, subject to
equation (7), and initialise with a draw from $N(0, 2^2)$, which is reasonable because $u$ has
density $\phi(u; 0, 1)$. Let $\hat{u}^k$ be the mode of $\ell^k(u)$. Then we obtain a normal approximation
$\phi(u; \hat{u}^k, \Lambda^k)$ to $\ell^k(u)$, where $\Lambda^k = -\left( \partial^2 \log \ell^k(u) / \partial u^2 \right)^{-1}$ evaluated at $u = \hat{u}^k$. Appendix B
gives more details on the optimization procedure used here, and in the next example.
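
The mode search and curvature calculation can be sketched as follows for the model in equations (6) and (7). This is only an illustration: it uses a plain Newton iteration with analytic derivatives, which is an assumption made here and is not necessarily the optimiser described in Appendix B. The names `log_l_derivs` and `laplace_mode` are hypothetical.

```python
import numpy as np

def log_l_derivs(u, y, x_prev, phi, delta, sigma_u, sigma_e):
    """First and second derivatives of log l(u) for the quadratic AR(1) model."""
    r = y - phi * x_prev - sigma_u * (u + delta * u ** 2)   # y_t minus its mean given u
    dr = -sigma_u * (1.0 + 2.0 * delta * u)                 # dr/du
    d2r = -2.0 * sigma_u * delta                            # d2r/du2
    grad = -r * dr / sigma_e ** 2 - u
    hess = -(dr ** 2 + r * d2r) / sigma_e ** 2 - 1.0
    return grad, hess

def laplace_mode(y, x_prev, phi, delta, sigma_u, sigma_e, start,
                 n_iter=100, tol=1e-12):
    """Newton search for a local mode of log l(u); returns (mode, variance)."""
    u = float(start)
    for _ in range(n_iter):
        grad, hess = log_l_derivs(u, y, x_prev, phi, delta, sigma_u, sigma_e)
        if hess >= 0.0:
            # Newton steps only make sense where log l(u) is locally concave.
            raise RuntimeError("not locally concave; try another starting value")
        step = grad / hess
        u -= step
        if abs(step) < tol:
            break
    _, hess = log_l_derivs(u, y, x_prev, phi, delta, sigma_u, sigma_e)
    return u, -1.0 / hess          # variance = -(d2 log l / du2)^{-1} at the mode

# Two different starting values can lead to two different local modes.
mode_a, var_a = laplace_mode(0.5, 0.0, 0.6, 0.5, 1.0, 0.2, start=1.0)
mode_b, var_b = laplace_mode(0.5, 0.0, 0.6, 0.5, 1.0, 0.2, start=-2.0)
```

The search is local, so the mode it returns depends on the starting value.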

The normal approximation must be renormalised to adjust for the presence of multiple
local modes. Multiple modes occur because the law of motion is a nonlinear polynomial,
so that $\log \ell^k(u)$ is also nonlinear; in this case, a quartic polynomial in $u$. In other words, a given
observation $y_t$ might have been generated by more than one possible value of $u_t$. For that
reason, a simple normal approximation is inadequate. To see this, consider the limit as
$\sigma_\epsilon \to 0$. In this case, the observation equation becomes deterministic, and almost any realised level of $y_t$
is then consistent with two values of $u_t$. Suppose, in a particular case, we label those two
values $u_t^{(1)}$ and $u_t^{(2)}$. The algorithm, as described so far, would place a mass of 1 on the
value of $u_t$ that the numerical optimiser found; that is, if it happened to find the first mode,
it would sample $u_t$ from $g(u_t \mid y_t, x_{t-1}) = \phi(u_t; u_t^{(1)}, \Lambda_t^{(1)})$, where $\Lambda_t^{(1)} = -\left( \partial^2 \log \ell^k(u) / \partial u^2 \right)^{-1}$
evaluated at $u = u_t^{(1)}$.

Figure 1 illustrates this point. The left-hand panel plots $x_t$ as a function of $u_t$, that
is, equation (7). This plot assumes a previous value $x_{t-1} = 0$, and parameter values
$\delta = 0.5$, $\sigma_\epsilon = 0.2$, $\sigma_u = 1$. Any value of the underlying state $x_t > -0.5$ is consistent with
two possible values of $u_t$. For example, a value of $x_t = 0.5$ could have been generated
by $u_t \approx 0.41$ or $u_t \approx -2.4$. The right-hand panel plots an unnormalised version of $\ell^k(u)$,
conditional on an observation $y_t = 0.5$. The distribution is obviously bimodal; with the
noise variance $\sigma_\epsilon^2$ small in this case, the two modes of $\ell^k(u)$ are close to 0.41 and $-2.4$.
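
The two candidate disturbances quoted above follow directly from equation (7). As a short worked check, with $x_{t-1} = 0$, $\delta = 0.5$ and $\sigma_u = 1$:

```latex
\begin{aligned}
x_t &= u_t + \tfrac{1}{2} u_t^2
\quad\Longrightarrow\quad
u_t = -1 \pm \sqrt{1 + 2 x_t}, \\
x_t = 0.5 &:\quad u_t = -1 \pm \sqrt{2} \approx 0.41 \ \text{or}\ -2.41, \\
\min_{u_t} x_t &= -\tfrac{1}{2} \quad \text{at } u_t = -1 .
\end{aligned}
```

The square root is real only when $x_t \geq -0.5$, which is why states above $-0.5$ have two pre-images and states below it have none.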

With $\sigma_\epsilon \to 0$, the simple maximisation approach described above would place a weight
of 1 on, say, the value of $u = 0.41$, ignoring the other possibility. This is not an accurate
approximation to the true $p(u_t \mid y_t, x_{t-1})$, and will produce what appears to be a biased
estimate of the log-likelihood. Although the ADPF likelihood estimate is unbiased no matter what proposal
density $g(u_t \mid y_t, x_{t-1})$ is used, in cases with a high signal to noise ratio, a very large number
of particles is required to counteract the problem described above, which defeats the
purpose of using the filter. Instead, a better approximation is given by the mixture density

$$g(u_t \mid y_t, x_{t-1}) \propto \phi(u_t^{(1)}; 0, 1)\, \phi(u_t; u_t^{(1)}, \Lambda_t^{(1)}) + \phi(u_t^{(2)}; 0, 1)\, \phi(u_t; u_t^{(2)}, \Lambda_t^{(2)}). \tag{8}$$

In this simple model, it is feasible to search for both modes $u_t^{(1)}$ and $u_t^{(2)}$, but we develop a
more general approach, with the goal of constraining the required computation time as the
dimension of the model increases. Given a set of estimates $\hat{u}^k$, obtained as described above,
we form a proposal density for each disturbance $u_t^k$, $k = 1, \dots, N$, by taking an equally-weighted
mixture of those that generate a value for $y_t$ within 3 standard deviations of
the observed value. That is,

$$g(u_t^k \mid y_t, x_{t-1}^k) \propto \sum_{i=1}^{N} \chi\!\left[ \left( \frac{y_t - \phi x_{t-1}^k - \sigma_u \left( \hat{u}^i + \delta (\hat{u}^i)^2 \right)}{\sigma_\epsilon} \right)^{2} < 3^2 \right] \phi(u_t^k; \hat{u}^i, \Lambda^i), \tag{9}$$

where $\chi[\omega]$ is the characteristic or indicator function of event $\omega$. This takes advantage of
the fact that values of $x_{t-1}^k$ tend to be clustered, so that a value of $\hat{u}^i$ found for a particular
$x_{t-1}^i$ has a good chance of working well for another $x_{t-1}^k$, in the sense of implying a mean
value of $y_t$ close to the observed one. This method of generating a proposal generalises to
more complex models, such as the DSGE example discussed in Section 4 below. Note that
the proposal density is reweighted by the true density in step 4 of the ADPF algorithm.
Thus, there is no need to ensure that (for instance) the proposal density puts the correct
mass on different possible values of $u$, as in equation (8). All that is required is that each
possible value of $u$ has a reasonable chance of being used in the proposal.
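
The selection rule in equation (9) can be sketched as below for the quadratic AR(1) model. The function names are hypothetical, and the fallback used when no component passes the 3 standard deviation test is an assumption added here, since the text does not say what happens in that case.

```python
import numpy as np

def norm_pdf(z, mean, var):
    """Density of N(mean, var) at z (vectorised)."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def select_components(y, x_prev_k, u_hat, phi, delta, sigma_u, sigma_e, width=3.0):
    """Indicator in (9): modes whose implied mean of y_t is within `width` sd of y_t."""
    implied = phi * x_prev_k + sigma_u * (u_hat + delta * u_hat ** 2)
    keep = np.abs(y - implied) < width * sigma_e
    # Assumption (not from the source): if nothing passes, keep every component.
    return keep if keep.any() else np.ones_like(keep)

def mixture_pdf(u, u_hat, lam, keep):
    """Equally weighted normal mixture over the retained components."""
    comps = norm_pdf(np.asarray(u, dtype=float)[..., None], u_hat[keep], lam[keep])
    return comps.mean(axis=-1)

def mixture_draw(u_hat, lam, keep, size, rng):
    """Pick a retained component uniformly at random, then sample from it."""
    idx = rng.choice(np.flatnonzero(keep), size=size)
    return rng.normal(u_hat[idx], np.sqrt(lam[idx]))

# Illustrative inputs: two modes consistent with y_t = 0.5 and one that is not.
u_hat = np.array([0.41, -2.41, 1.5])
lam = np.array([0.001, 0.001, 0.001])
keep = select_components(0.5, 0.0, u_hat, phi=0.6, delta=0.5,
                         sigma_u=1.0, sigma_e=0.2)
draws = mixture_draw(u_hat, lam, keep, size=1000, rng=np.random.default_rng(0))
```

Because `mixture_pdf` returns a normalised density, its logarithm can be used directly as the $g(u_{t+1} \mid y_{t+1}, x_t)$ term in the step 4 weights.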

Figure 1 about here

We performed a simulation study to compare the performance of the ADPF with the
standard SIR particle filter and the 'CUPF1' algorithm described in Murray et al. (2012).
That algorithm has a similar structure to the ADPF, except that the first-stage proposal
density $g(y_t \mid x_{t-1}^k)$ and the second-stage density $g(u_t^k \mid y_t, x_{t-1}^k)$ are both given by the
Unscented Kalman Filter (Wan and van der Merwe, 2000), run individually for each particle.
We briefly describe the Unscented Kalman Filter, with a full description given by Wan
and van der Merwe (2000) and van der Merwe et al. (2001). In the Unscented Kalman
Filter the prior mean $\bar{x}_{t-1}$ is propagated through the model's law of motion, along with a
set of 'sigma points' $\bar{x}_{t-1} \pm \left( \sqrt{\lambda P_{t-1}} \right)_i$, where $P_{t-1}$ is the prior state covariance augmented
with the covariance of the disturbances and noise terms, $\lambda$ is a parameter of the algorithm,
$\sqrt{\cdot}$ denotes a matrix square root, and $(\cdot)_i$ is the $i$th row of the matrix.¹ After the sigma
points are propagated through the law of motion and the observation equation, we obtain
an accurate estimate of the mean and covariance of the state and disturbances, conditional
on the observation $y_t$.

We evaluated the filters on four different parameter settings: with either $\delta = 0.7$ (a
high nonlinearity case) or $\delta = 0.1$ (low nonlinearity), and with either $\sigma_\epsilon = 0.01$ (a high
signal to noise ratio (SNR)) or $\sigma_\epsilon = 1.0$ (low SNR). In all cases, we set $\sigma_u = 1$ and $\phi = 0.6$.

For each test, we simulated a single dataset of 50 observations. We chose this number 
of observations because it roughly corresponds to the length of a quarterly macroeconomic 
data series. Using 1000 replications, we calculated the median log-likelihood estimated by 
the filters on each dataset, along with the interquartile range of their estimates, as well as 
the standard deviation of their log-likelihood estimates. Additionally, we used a standard 
particle filter with 1,000,000 particles to estimate the true value of the loglikelihood, which 
we used to estimate the bias of the loglikelihood estimates of the other filters. 



¹ More precisely, $\lambda$ is a function of parameters $\alpha$, $\beta$ and $\kappa$. We used the values suggested in Wan and
van der Merwe (2000), namely $\alpha = 10^{-3}$, $\beta = 2$, $\kappa = 0$. We experimented with different values, but found
that this did not appear to alter the results substantially.
3.3 Results

The performance of the ADPF is comparable to that of the standard SIR filter in the two
cases with low signal-to-noise ratio. In both scenarios, the standard deviation of the
log-likelihood estimates from the ADPF with 50 particles is between those from the standard
particle filter with 100 and 500 particles. This is similar to the performance of the fully
adapted particle filter, which makes no substantial improvement on the standard particle
filter when the signal to noise ratio is relatively low.

The situation is different with a high SNR, reported in Tables 1 and 2. In these cases,
the ADPF proposal draws, which target $p(u_t \mid y_t, x_{t-1})$, are much more useful than the draws from the
proposal $p(x_t \mid x_{t-1})$ used by the standard particle filter, because the observation $y_t$ is highly
informative about the current position of $x_t$ (and therefore of $u_t$). As a result, the precision
of the ADPF with 50 particles is similar to that of the standard particle filter with more than
7500 particles when the model is markedly nonlinear, and more than 15000 in the approximately
linear case. Additionally, the CUPF1 variation does not appear to be well adapted to this
class of model, perhaps because more than the first two moments are required for a good
approximation to the target density.

Tables 1 and 2 about here.

Tables 3 to 6 report the estimated bias and variance in the log-likelihood estimates
for the four combinations of nonlinearity and signal to noise ratio. The asymptotic
analysis in Pitt et al. (2012) suggests that the log-likelihood estimates should have a bias
approximately equal to $-0.5$ times their variance, and that the variance should decrease
in proportion to the number of particles used. Our simulations of the simple quadratic
AR(1) model are broadly consistent with these expectations, with the predictions borne
out well in the low signal to noise ratio cases. The high signal to noise ratio cases reported
in Tables 5 and 6 appear to be consistent with the predictions of Pitt et al. (2012) as the
number of particles becomes large, though the uncertainty around these estimates is larger
in these cases.
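
The relation between bias and variance can be motivated by a short heuristic calculation. It assumes that the error $z$ in the log-likelihood estimate is approximately normal; the symbols $z$, $\mu$ and $\sigma^2$ are introduced here for illustration only.

```latex
\begin{aligned}
\log \hat{p}_N(y_{1:T}) &= \log p(y_{1:T}) + z, \qquad z \sim N(\mu, \sigma^2), \\
\mathrm{E}\!\left[ \hat{p}_N(y_{1:T}) \right] = p(y_{1:T})
&\;\Longrightarrow\; \mathrm{E}\!\left[ e^{z} \right] = e^{\mu + \sigma^2 / 2} = 1
\;\Longrightarrow\; \mu = -\tfrac{1}{2} \sigma^2 .
\end{aligned}
```

Under this assumption, unbiasedness of the likelihood estimate itself forces the log-likelihood estimate to be biased downwards by half its variance.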

Tables 3 to 6 about here.

4 Parameter estimation 

4.1 Example 1: Neoclassical growth model 
4.1.1 Model 

This section considers a basic neoclassical growth model. We choose this model because
it is a useful and simple benchmark for solving and estimating DSGE models, used for example
in Schmitt-Grohe (2004) and Gomme (2011). The model is based on the decisions of a
representative household, which chooses between consumption $c_t$ and investment in next
period's capital stock $k_t$. The household's goal is to maximise discounted lifetime utility,
given by



t=o 

subject to a feasibility constraint,

$$c_t + k_t = A_t k_{t-1}^{\alpha} + (1 - \delta) k_{t-1}, \tag{10}$$

where $\delta \in [0, 1]$, and a productivity shock

$$\log A_t = \rho \log A_{t-1} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma_\epsilon^2), \tag{11}$$

where $\rho \in (0, 1)$. The solution of the model consists of equations (10) and (11) plus a
consumption Euler equation.



(12) 



Here, $\mathrm{E}_t X$ denotes the model-consistent expectation of $X$, conditional on information
available at time $t$.

This solution can be converted to a Markov process on the assumption of rational
expectations as in Klein (2000). If the depreciation rate $\delta$ is below one, the conversion
cannot be expressed in closed form, and some type of approximation must be used. We
chose a second-order approximation, using the methods described in Klein (2000) and
Gomme (2011). The output of these methods is a law of motion for the vector $x_t =
(c_t, k_t, a_t)'$ of the form

$$x_t = d + E x_{t-1} + F \epsilon_t + (I_3 \otimes x_{t-1}')\, G x_{t-1} + (I_3 \otimes x_{t-1}')\, H \epsilon_t + (I_3 \otimes \epsilon_t')\, J \epsilon_t. \tag{13}$$



The reduced-form coefficient matrices d, E, F, G, H and J are functions of the structural 
parameters, but must be calculated numerically as we do not have analytical expressions 
for them. 

As is standard in the DSGE literature, we add 'measurement error' to the observation
equation, in order to avoid stochastic singularity and for computational convenience,
i.e.

$$y_t = [1, 0, 0]\, x_t + u_t, \qquad u_t \sim N(0, \sigma_u^2). \tag{14}$$

For linear DSGE models, $u_t$ is usually assumed to be small, with $\sigma_u^2$ many orders of
magnitude smaller than $\sigma_\epsilon^2$. This assumption is sometimes relaxed for second-order estimation
in order to reduce the sampling error of the standard particle filter. In some cases, such
as the asset pricing model considered below, measurement error is an important part of
the model, and the variance of the measurement noise is comparable to that of the
innovations in the model's law of motion. Here, we maintain the assumption of a high
signal to noise ratio, setting $\sigma_u^2$ to 10~®. We set the rest of the parameters to fairly
standard calibrated values (see e.g. Gomme, 2011; Schmitt-Grohe, 2004). Specifically, we set
$\beta = 0.99$, $\alpha = 1/3$, $\rho = 0.8$, $\delta = 0.05$ and $\sigma_\epsilon = 0.02$.



4.1.2 Estimation 

To evaluate the performance of the ADPF in estimation, we simulated a data series of
50 observations using equations (13) and (14). Again, we chose this number of observations
because it is of the same order of magnitude as a macroeconomic time series. We then
use the adaptive random walk Metropolis-Hastings algorithm (Roberts and Rosenthal, 2001) to take
100,000 draws of the parameter vector. We fix $\beta$ at 0.99. This is standard, as it is
difficult to identify $\beta$. The vector of unknown parameters is $\theta = (\alpha, \rho, \delta, \sigma_\epsilon)$. Table 7
summarizes the priors on the structural parameters, which are set relatively loosely to
assist identification.

Table 7 about here.

We initialised the Metropolis-Hastings chain at the maximum likelihood estimate
obtained from a first-order approximation of the model via the Kalman filter. We chose this
initialisation method because we observed that the standard deviation of the log of the
estimate of the likelihood obtained by the ADPF increased significantly in some areas of
the support of $\theta$ away from the true values, making it difficult for the Metropolis-Hastings
algorithm to converge. Additionally, initialising the MCMC chain in this way has been
the practice for second-order DSGE estimation using the standard particle filter, as in
Fernández-Villaverde and Rubio-Ramírez (2008) and Amisano and Tristani (2010). The
Metropolis-Hastings proposal covariance matrix was initialised to a diagonal matrix of
small positive values, with adaptation beginning after 100 draws.

We repeated this procedure using different numbers of particles in the particle filter
and the ADPF (using the same simulated data). For each estimation run, we report the
Metropolis-Hastings acceptance rate, the inefficiency, and the computation time. For each
component of the parameter vector, the inefficiency is calculated as $\mathrm{IF} = 1 + 2 \sum_{j=1}^{L^*} \rho_j$,
where $\rho_j$ is the estimated autocorrelation of the parameter iterates at lag $j$. If $K$ is the
sample size used to compute $\rho_j$, then the maximum lag length is set to $L^* = \min\{1000, L\}$,
with $L$ being the lowest index $j$ such that $|\rho_j| < 2 / \sqrt{K}$.
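
The inefficiency calculation can be sketched as below. The function name `inefficiency_factor` is hypothetical, and the code assumes that the sum runs up to and including the first lag whose autocorrelation falls below the threshold, which is one reading of the definition of $L^*$ above.

```python
import numpy as np

def inefficiency_factor(draws, max_lag=1000):
    """IF = 1 + 2 * sum_{j=1}^{L*} rho_j with L* = min(max_lag, L)."""
    z = np.asarray(draws, dtype=float)
    z = z - z.mean()
    k = z.size                                  # sample size K
    denom = np.dot(z, z)
    threshold = 2.0 / np.sqrt(k)
    total = 0.0
    for j in range(1, min(max_lag, k - 1) + 1):
        rho_j = np.dot(z[:-j], z[j:]) / denom   # lag-j sample autocorrelation
        total += rho_j
        if abs(rho_j) < threshold:              # j = L: first lag below 2 / sqrt(K)
            break
    return 1.0 + 2.0 * total
```

For independent draws the result should be close to one, while a strongly autocorrelated chain gives a much larger value, meaning that more iterates are needed per effectively independent draw.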

Since the actual wall-clock estimation time depends heavily on the details of a particular
implementation, we report instead a measure of computation time calculated as the number
of evaluations of the model's law of motion required to produce one effectively independent
draw from a given parameter. Thus, if $N$ particles use the law of motion an average of
$k$ times each in an MCMC run with an inefficiency (as described above) given by IF,
then the computation time is taken to be $\mathrm{CT} = k \times N \times \mathrm{IF}$. Pitt et al. (2012) measure
the computation time as $N \times \mathrm{IF}$. Here, we ensure a fair comparison with the standard
particle filter by penalising the ADPF for the multiple function evaluations required for
optimising $\ell^k(u_t)$. We estimate $k$ by keeping a tally of the number of times the law of
motion subroutine was called during estimation. For this model, the value of $k$ was around
16 per particle per observation.

In implementing the ADPF, we use the same approach as in Section 3.2. For the
first-stage proposal density $g(y_{t+1} \mid x_t)$, we use a normal distribution matching the first two
moments of $y_{t+1}$ conditional on $x_t$, which can be calculated from equations (13) and (14).
Specifically, substituting (13) into (14), dropping the negligible measurement error term
$u_t$, and taking expectations, the mean of $y_t$ conditional on $x_{t-1}$ is

$$\mu_{y,t} = d_1 + E_1 x_{t-1} + x_{t-1}' G_1 x_{t-1} + \sigma_\epsilon^2 J_1, \tag{15}$$

where $d_1$ and $E_1$ are the first element and row of $d$ and $E$, $G_1$ is the upper $3 \times 3$ block of
$G$, and $J_1$ is the first element of $J$. Similarly, the conditional variance of $y_t$ is

$$\sigma_{y,t}^2 = \sigma_\epsilon^2 F_1^2 + 2 \sigma_\epsilon^2 F_1 x_{t-1}' H_1 + \sigma_\epsilon^2 (x_{t-1}' H_1)^2 + 2 \sigma_\epsilon^4 J_1^2, \tag{16}$$

where $F_1$ is the first element of $F$, $H_1$ is the upper $3 \times 1$ block of $H$, and $J_1$ is the first element of $J$.
Note that if we use a first-order approximation to the solution of the DSGE model, then
$G = H = J = 0$, and the mean (15) and variance (16) are equal to the one-step prediction
mean and variance from the Kalman filter (conditioned on a given value for $x_{t-1}$).
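
The first-stage moments can be computed directly from the law of motion (13) when $\epsilon_t$ is a scalar normal shock, as in the sketch below. The function names and the array shapes (`d` of length 3, `E` $3 \times 3$, `F` $3 \times 1$, `G` $9 \times 3$, `H` $9 \times 1$, `J` $3 \times 1$) are assumptions chosen to make the Kronecker products in (13) conformable. The measurement error is dropped, as in the text.

```python
import numpy as np

def first_stage_moments(x_prev, d, E, F, G, H, J, sigma_eps):
    """Mean and variance of the first element of x_t given x_{t-1} under (13)."""
    G1, H1 = G[:3, :], H[:3, 0]                  # upper blocks multiplying x'_{t-1}
    # E[eps^2] = sigma^2 contributes the J term to the mean.
    mean = d[0] + E[0] @ x_prev + x_prev @ G1 @ x_prev + sigma_eps ** 2 * J[0, 0]
    slope = F[0, 0] + x_prev @ H1                # total coefficient on eps_t
    # Var(slope * eps + J1 * eps^2), using Var(eps^2) = 2 sigma^4 and Cov(eps, eps^2) = 0.
    var = sigma_eps ** 2 * slope ** 2 + 2.0 * sigma_eps ** 4 * J[0, 0] ** 2
    return mean, var

def law_of_motion(x_prev, eps, d, E, F, G, H, J):
    """Equation (13) for a scalar shock eps_t."""
    kx = np.kron(np.eye(3), x_prev[None, :])     # I_3 (x) x'_{t-1}, shape 3 x 9
    e = np.array([eps])
    ke = np.kron(np.eye(3), e[None, :])          # I_3 (x) eps'_t, shape 3 x 3
    return d + E @ x_prev + F @ e + kx @ G @ x_prev + kx @ H @ e + ke @ J @ e

# Hypothetical coefficient matrices, used only to exercise the functions.
rng = np.random.default_rng(0)
d, E, F = rng.normal(size=3), 0.3 * rng.normal(size=(3, 3)), rng.normal(size=(3, 1))
G, H, J = (0.1 * rng.normal(size=(9, 3)), 0.1 * rng.normal(size=(9, 1)),
           0.1 * rng.normal(size=(3, 1)))
x_prev, sigma_eps = rng.normal(size=3), 0.5
mean, var = first_stage_moments(x_prev, d, E, F, G, H, J, sigma_eps)
```

Setting `G`, `H` and `J` to zero reduces these expressions to the linear one-step prediction moments, in line with the remark above about the first-order approximation.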



4.1.3 Results and Analysis 

As expected, the estimated parameter values from all filters are very similar. However, there
are notable differences in efficiency and computing time. Table 8 reports the
Metropolis-Hastings acceptance rates for different numbers of particles used in the standard particle
filter and the ADPF. It also shows the inefficiencies for each component of the parameter
vector. As the number of particles used increases, so that the estimates of the log-likelihood
become more precise, the acceptance rate increases. This is true for both the standard
particle filter and the ADPF. Broadly speaking, the ADPF performs about as well with 50
particles as does the standard particle filter with several thousand particles. Conversely,
to approach the performance of the ADPF with 300 particles, the standard particle filter
must use about 10,000.

The differences in inefficiency are also reflected in the estimates of computing time,
which are reported in Table 9. As explained above, the estimates of computing time are a
function both of the number of computations required to generate a given number of draws
of the parameter vector, and also of the inefficiency of those draws. While the inefficiencies
decrease steadily as the number of particles is increased, the computing time involves a
tradeoff between higher numbers of particles (which reduce inefficiency) and lower numbers
of particles (which directly reduce computing time). The optimal computing times occur
with around 1,500 particles for the standard particle filter and 30 for the ADPF. The
computing time of the ADPF is roughly one fifth of that of the standard particle filter. Since
the implementation of the ADPF leaves scope for optimisation or parallel computing, this
relative performance could be improved further in practice.²

Table 9 also shows the variance of each filter's log-likelihood estimates, evaluated by
taking 75 repeated log-likelihood estimates at the true value of $\theta$. These results are broadly
consistent with the asymptotic analysis in Pitt et al. (2012), which suggests that the optimal
computing time would be attained when the log-likelihood variance is around 0.81. In the
case of the ADPF, the optimal computing times occur when the variance is slightly higher
than that. We conjecture that this is because the analysis in Pitt et al. (2012) assumes
that a perfect Metropolis-Hastings proposal distribution is available.

Tables 8 and 9 about here.



4.2 Example 2: Asset pricing with habits 
4.2.1 The Model 

This section demonstrates the full-information estimation of a structural asset pricing
model, specifically, a consumption-based model with external habits (Campbell and Cochrane,
1999). Previously, this type of model has been taken to the data by matching moments
(e.g. Campbell and Cochrane (1999)), using GMM (e.g. Hyde and Sherif (2005)), or a linear
approximation (e.g. Bouakez et al. (2005)). Here, we estimate the likelihood directly.



The model assumes that the representative agent's consumption process is

$$\Delta \log C_t = g + v_t, \tag{17}$$

where $v_t \sim N(0, \sigma^2)$. The agent's utility function is given by

Ut 



{Ct - Xt 



i=0 



7 



(18) 



where $X_t$ is the (external) habit stock, interpreted as the minimum level of consumption
required to maintain a well-defined utility (i.e., the household must ensure that $C_t > X_t$).
The surplus consumption ratio $S_t$ and the deviation $s_t$ of $\log S_t$ from the log of its mean $\bar{S}$ are defined
by

$$S_t = \frac{C_t - X_t}{C_t} \qquad \text{and} \qquad s_t = \log S_t - \log \bar{S}.$$

¹ The optimisation step of the ADPF algorithm can be performed in parallel for larger problems.
Additionally, we penalise the ADPF for every evaluation of the law of motion; but, conditional
on $x_{t-1}$, only half of equation (13) needs to be recalculated for a given value of $e_t$.

The law of motion of $s_t$ is assumed to be

$$s_t = \phi s_{t-1} + \lambda(s_{t-1}) v_t, \tag{19}$$

where $\lambda(\cdot)$ is the sensitivity function of Campbell and Cochrane (1999), the
disturbance $v_t$ is the same as the consumption innovation in equation (17), and
the steady-state level of $S_t$ is

$$\bar{S} = \sigma_v \sqrt{\frac{\gamma}{1 - \phi}}. \tag{20}$$


See Campbell and Cochrane (1999) for discussion of the derivation of equations (19) and
(20). The ratio $S_t$ is stationary, since the level of the habit stock $X_t$ is constructed to
grow at the same rate as $C_t$ in the long run. The nonlinear functional form of the law of
motion (19) means that habit is a slowly-moving average of recent consumption in 'normal
times' (when the habit stock is close to its mean) but responds more sensitively to
consumption innovations during 'bad times' (when $s_t$ is low). This nonlinearity allows the
model to address both the equity premium puzzle and the risk-free rate puzzle (see Campbell
and Cochrane (1999) for further discussion).
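
The habit dynamics described above can be illustrated with a short simulation. The Python sketch below is not the authors' code: it assumes the sensitivity function of Campbell and Cochrane (1999), written in terms of the deviation $s_t$, and the steady-state expression from the same source. The function names and the parameter values are hypothetical and are not the estimates reported in this paper.

```python
import numpy as np

def steady_state_surplus(sigma_v, gamma, phi):
    """Steady-state surplus consumption ratio, as in Campbell and Cochrane (1999)."""
    return sigma_v * np.sqrt(gamma / (1.0 - phi))

def sensitivity(s_dev, S_bar):
    """Assumed Campbell-Cochrane sensitivity function, in deviation form.

    It is zero above s_max and rises as s_dev falls ('bad times').
    """
    s_max = 0.5 * (1.0 - S_bar ** 2)
    if s_dev > s_max:
        return 0.0
    return np.sqrt(1.0 - 2.0 * s_dev) / S_bar - 1.0

def simulate(T, g, sigma_v, gamma, phi, seed=0):
    """Simulate consumption growth and the log surplus ratio deviation."""
    rng = np.random.default_rng(seed)
    S_bar = steady_state_surplus(sigma_v, gamma, phi)
    v = sigma_v * rng.standard_normal(T)      # consumption innovations
    dlog_c = g + v                            # consumption growth
    s = np.zeros(T + 1)                       # start at the steady state
    for t in range(T):
        s[t + 1] = phi * s[t] + sensitivity(s[t], S_bar) * v[t]
    return dlog_c, s[1:], S_bar

# Hypothetical quarterly parameter values, for illustration only.
dlog_c, s, S_bar = simulate(T=248, g=0.005, sigma_v=0.0075, gamma=2.0, phi=0.97)
print("steady-state surplus ratio:", S_bar)
print("sensitivity at s = 0 and s = -0.5:",
      sensitivity(0.0, S_bar), sensitivity(-0.5, S_bar))
```

The second printed line shows the asymmetry discussed in the text: the same consumption innovation moves $s_t$ by more when $s_t$ is low.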

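On that basis, it can be shown that the equilibrium price-dividend ratio of a financial
asset satisfies

$$\frac{P_t}{D_t} = \beta_t E_t \left\{ \exp\left[\gamma (s_t - s_{t+1}) + (1 - \gamma)(g + v_{t+1})\right] \left(1 + \frac{P_{t+1}}{D_{t+1}}\right) \right\}, \tag{21}$$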



where $\beta_t$ is the intertemporal discount factor in period $t$. This is the Fundamental
Theorem of Finance (the current asset price equals the risk-neutral expectation of next
period's asset price and return), using the functional forms implied by equations (18) to
(20) (Campbell and Cochrane, 1999). Since not all changes in the typical household's
intertemporal trade-offs can be explained by consumption habits, we choose to perturb this
parameter with a shock process that is a random walk in logs,

$$b_t = b_{t-1} + e_t, \tag{22}$$

where $e_t \sim N(0, \sigma_e^2)$.

Equations (17), (19), (21) and (22) characterise the model. The observed variables are
$\Delta \log C_t$ and $\Delta \log (P_t / D_t)$, and the observation equations are (17) and
(21). We approximate equation (21) using a third-order Taylor series expansion in $s_t$ at
$s_t = 0$ because it is the simplest approximation that describes the nonlinear behaviour of
the price-dividend ratio adequately. Thus equation (21) is approximated as

$$\log \frac{P_t}{D_t} = \Gamma_0 + \Gamma_1 s_t + \Gamma_2 s_t^2 + \Gamma_3 s_t^3, \tag{23}$$

where the $\Gamma_j$ coefficients are functions of the structural parameters (see Appendix C).
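
To make the cubic approximation concrete, the following Python sketch evaluates a polynomial of the form in equation (23) together with its analytical derivative with respect to $s_t$, which is the kind of derivative an optimiser can use in place of numerical differentiation. The function names and the coefficient values are hypothetical; in the model the coefficients are functions of the structural parameters.

```python
import numpy as np

def log_pd_ratio(s, gammas):
    """Cubic approximation to the log price-dividend ratio (Horner form)."""
    g0, g1, g2, g3 = gammas
    return g0 + s * (g1 + s * (g2 + s * g3))

def dlog_pd_ratio_ds(s, gammas):
    """Analytical derivative of the cubic approximation with respect to s."""
    _, g1, g2, g3 = gammas
    return g1 + s * (2.0 * g2 + 3.0 * g3 * s)

# Hypothetical coefficient values, chosen only to exercise the functions.
gammas = (3.0, 1.5, -0.4, 0.1)
s_grid = np.linspace(-0.5, 0.5, 5)
print(log_pd_ratio(s_grid, gammas))
print(dlog_pd_ratio_ds(s_grid, gammas))
```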
4.2.2 Estimation



We apply the model to observations of growth in the S&P 500 price-dividend ratio and US
consumption using quarterly observations from 1950 to 2011, a total of 248 datapoints, that
are plotted in Figure 2. The S&P 500 series is from Shiller (2006), while the consumption
series is the seasonally adjusted real personal consumption expenditure series from the
Bureau of Economic Analysis (series code PCECC96).

Figure 2 about here.



Since it is unlikely that consumption is observed perfectly, we modify equation (17) to
include an iid noise term $\eta_t \sim N(0, \sigma_\eta^2)$. The restrictions placed on the
joint distribution of consumption and asset price growth by the structural model allow us to
identify this noise term separately from the consumption innovation $v_t$. We use a fairly
tight prior distribution to constrain the variance of $\eta_t$, because the structural model
would otherwise struggle to improve on the assumption that consumption and price-dividend
ratio growth are both iid. This exercise is intended to illustrate the ADPF estimation method,
rather than provide a precise explanation of intertemporal saving decisions, and while the
addition of external habits greatly improves the consumption-based asset pricing model, its
chief virtue is still its simplicity, rather than its flexibility.



Table 10 gives the prior distributions of the other parameters. The priors are chosen to
ensure that they imply reasonable properties for the risk-free interest rate and consumption.
Specifically, we select priors on $g$ and $\sigma_v$ to ensure that the underlying consumption
growth series is close in mean and variance to the observed one. More importantly, Campbell
and Cochrane (1999) show that the implied level of the risk-free rate is



$$r^f = -\log \beta + \gamma g - \frac{\gamma (1 - \phi)}{2}. \tag{24}$$



Instead of placing a prior on $\beta$ directly, we choose a prior for $r^f$ that ensures it is
a low positive number. Finally, we choose relatively loose priors for $\gamma$ and $\phi$.
Table 10 summarises the prior distributions of the parameters, which we assume are mutually
independent.

Table 10 about here.

To estimate the model, we took 25,000 adaptive Metropolis-Hastings draws, with the
proposal covariance matrix initialised to a small diagonal matrix, and adaptation beginning
after 100 draws. Unlike the previous example, the posterior mode located using the Kalman
filter and a first-order approximation to the model was not particularly close to the posterior
mode using a higher-order approximation. For that reason, we initialised the chain at values
close to the calibrations described in Campbell and Cochrane (1999). In implementing the
ADPF, we used the same approach as in previous examples. In all cases, we discarded the
first 1000 MCMC draws.
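
The sampling scheme just described can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the adaptation rule (the empirical covariance of past draws scaled by $2.38^2/d$, plus a small jitter) is an assumed, commonly used choice, and the function name `adaptive_mh`, its arguments and the toy Gaussian target are hypothetical. In a particle MCMC setting `log_target` would return the log prior plus the particle-filter estimate of the loglikelihood, and the stored value for the current draw must be reused rather than re-estimated, which is what the code below does.

```python
import numpy as np

def adaptive_mh(log_target, theta0, n_draws, adapt_start=100, init_scale=1e-2, seed=0):
    """Random-walk Metropolis-Hastings with an adaptive proposal covariance."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    d = theta.size
    draws = np.empty((n_draws, d))
    logp = log_target(theta)                  # kept fixed until a move is accepted
    cov = init_scale * np.eye(d)              # small diagonal initial covariance
    accepted = 0
    for i in range(n_draws):
        if i >= adapt_start:
            # Assumed adaptation rule: scaled empirical covariance of past draws.
            emp = np.atleast_2d(np.cov(draws[:i], rowvar=False))
            cov = (2.38 ** 2 / d) * emp + 1e-8 * np.eye(d)
        proposal = rng.multivariate_normal(theta, cov)
        logp_prop = log_target(proposal)
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
            accepted += 1
        draws[i] = theta
    return draws, accepted / n_draws

# Toy target (hypothetical): independent normals with means (1, -1) and std. dev. 0.5.
def log_target(theta):
    return -0.5 * np.sum(((theta - np.array([1.0, -1.0])) / 0.5) ** 2)

draws, acc_rate = adaptive_mh(log_target, [0.0, 0.0], n_draws=6000)
posterior_mean = draws[1000:].mean(axis=0)    # discard the first 1000 draws
print("acceptance rate:", acc_rate)
print("posterior mean:", posterior_mean)
```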
4.2.3 Results and Analysis

Table 11 reports the posterior means and standard deviations (in brackets) of the
parameters. Table 12 reports the acceptance rates and the inefficiency estimates of the
parameters. The table shows that the Metropolis-Hastings acceptance rates generally
increase and the inefficiencies generally decrease with a higher number of particles for both
the standard particle filter and the ADPF. Broadly speaking, the performance of the ADPF
with a given number of particles appears to be comparable to that of the standard particle
filter with 15 or 20 times more particles.

Table 13 presents the computing times, calculated in the same manner as described
above. Here, a clearer difference emerges between the standard particle filter and the
ADPF. The estimated computing times of the ADPF are roughly half those of the
standard particle filter. In calculating these times, we penalised the use of the analytical
derivative of equation (23) as heavily as the use of the law of motion itself. If these
analytical derivatives are discounted (as may happen in applications where the derivatives
are considerably simpler than the transition equations), the ADPF performs around 5 times
better than the standard particle filter.
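
As a rough arithmetic check on the comparison above, the entries of Table 13 are consistent with the product of the number of particles, the inefficiency from Table 12, the cost factor $k$ (1 for the standard filter, 7.1 for the ADPF) and the sample size of 248, divided by $10^5$. That formula is inferred here from the tabulated values, so it should be read as an assumption rather than as the definition given earlier in the paper; the function name is hypothetical.

```python
def computing_time(n_particles, inefficiency, k=1.0, n_obs=248, scale=1e5):
    """Assumed computing-time measure: k * N * inefficiency * T / scale."""
    return k * n_particles * inefficiency * n_obs / scale

# Inefficiencies for the parameter g, taken from Table 12.
std_200 = computing_time(200, 115.8)          # standard particle filter, N = 200
adpf_30 = computing_time(30, 52.2, k=7.1)     # ADPF, N = 30
print(round(std_200, 1), round(adpf_30, 1))   # compare with 57.4 and 27.6 in Table 13
print(round(std_200 / adpf_30, 2))            # ratio of the two computing times
```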



Tables 11, 12 and 13 about here.



Notably, the posterior mean of $r^f$ is unchanged from its prior mean. However, despite
the apparently weak identification of this parameter, the nonlinear estimation reveals some
amount of information about it. Figure 3 is a scatterplot showing the values of $\beta$ implied
by the draws of $r^f$ against the draws of $\phi$. (The graph shows 1500 randomly-selected
draws from the MCMC chain for the ADPF with 50 particles.) Perhaps surprisingly, a
more persistent habit stock (higher $\phi$) is associated with more value placed on the future
(higher $\beta$). This is in fact consistent with the relationship implied by equation (21). To
see this, substitute (19) into (21) and ignore shocks after period $t$,

$$\frac{P_t}{D_t} \propto \beta \exp\left[\gamma (1 - \phi) s_t\right] \left(1 + \frac{P_{t+1}}{D_{t+1}}\right),$$

showing that a rise in $\beta$ is, other things equal, associated with a fall in $(1 - \phi)$.

Figure 3 about here.



5 Conclusion 

The filter discussed in this paper offers an attractive alternative to the standard particle
filter for estimating nonlinear structural models. Broadly speaking, in comparison to
the standard filter, the ADPF requires much lower computing times for a given level of
estimation accuracy.

Appendices



Appendix A Proofs 



The proof can be derived from Del Moral (2004) (Section 7.4.2, Proposition 7.4.1). However,
we believe that it is easier to follow the proof of Theorem 1 in Pitt et al. (2012) and
we do so here. Let $A_t = \{(x_t^k, \pi_t^k), k = 1, \dots, N\}$ be the swarm of particles at
time $t$.

Proof of Theorem 1.

$$E\left(\hat{p}_N(y_t \mid y_{1:t-1}) \mid A_{t-1}\right) = \sum_{k=1}^{N} p(y_t \mid x_{t-1}^k)\, \pi_{t-1}^k \tag{25}$$

$$E\left(\hat{p}_N(y_{t-h:t} \mid y_{1:t-h-1}) \mid A_{t-h-1}\right) = \sum_{k=1}^{N} p(y_{t-h:t} \mid x_{t-h-1}^k)\, \pi_{t-h-1}^k \tag{26}$$

$$E\left(\hat{p}_N(y_{1:t}) \mid A_0\right) = \sum_{k=1}^{N} p(y_{1:t} \mid x_0^k)\, \pi_0^k \tag{27}$$

$$E\left(\hat{p}_N(y_{1:t})\right) = p(y_{1:t}) \tag{28}$$

Equations (25) and (26) are obtained as in Lemmas 6 and 7 respectively of Pitt et al.
(2012). Equation (27) is obtained by taking $h = t - 1$ in (26) and (28) is obtained from
(27) because $A_0 = \{(x_0^k, \pi_0^k), k = 1, \dots, N\}$ with $x_0^k \sim p(x_0)$ and
$\pi_0^k = 1/N$. □
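
The unbiasedness property in (28) can be checked numerically on a model where the exact likelihood is available. The sketch below does this for the standard bootstrap particle filter (not the ADPF) on a hypothetical scalar linear Gaussian model, comparing the average of many likelihood estimates with the Kalman filter likelihood. The model, parameter values and function names are all illustrative assumptions.

```python
import numpy as np

def kalman_loglik(y, a, q, r, p0):
    """Exact loglikelihood for x_t = a x_{t-1} + N(0, q), y_t = x_t + N(0, r)."""
    m, p, ll = 0.0, p0, 0.0
    for yt in y:
        m, p = a * m, a * a * p + q                    # predict
        s = p + r                                      # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * s) + (yt - m) ** 2 / s)
        gain = p / s
        m, p = m + gain * (yt - m), (1.0 - gain) * p   # update
    return ll

def bootstrap_pf_loglik(y, a, q, r, p0, n_particles, rng):
    """Loglikelihood estimate from a bootstrap filter with multinomial resampling."""
    x = rng.normal(0.0, np.sqrt(p0), n_particles)      # x_0^k ~ p(x_0), weights 1/N
    ll = 0.0
    for yt in y:
        x = a * x + rng.normal(0.0, np.sqrt(q), n_particles)
        logw = -0.5 * (np.log(2 * np.pi * r) + (yt - x) ** 2 / r)
        c = logw.max()
        w = np.exp(logw - c)
        ll += c + np.log(w.mean())                     # log of the average weight
        x = rng.choice(x, size=n_particles, p=w / w.sum())
    return ll

y = np.array([0.3, -0.1, 0.5, 0.2])                    # hypothetical data
a, q, r, p0 = 0.9, 0.5, 0.5, 1.0
rng = np.random.default_rng(1)
ll_hat = np.array([bootstrap_pf_loglik(y, a, q, r, p0, 50, rng) for _ in range(2000)])
exact = kalman_loglik(y, a, q, r, p0)
ratio = np.mean(np.exp(ll_hat - exact))                # should be close to 1
print("mean likelihood estimate / exact likelihood:", ratio)
print("mean loglikelihood estimate minus exact:", ll_hat.mean() - exact)
```

The likelihood estimate is unbiased, whereas the loglikelihood estimate is generally biased downwards by Jensen's inequality, which is why the tables report a loglikelihood bias.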



Appendix B Optimisation Method 



While the optimisation of $\ell(u_t \mid y_t, x_{t-1}^k)$ can be performed in many ways, we
find that the Levenberg-Marquardt algorithm (Marquardt, 1963), as implemented by Moré et al.
(1980), delivers satisfactory results. This algorithm is applied sequentially to each particle
at each time step. By default, it uses numerical differentiation to estimate

$$g = \frac{\partial \ell}{\partial u_i} \quad \text{and} \quad A = \frac{\partial^2 \ell}{\partial u_i \, \partial u_j}.$$

In the case of the asset pricing model, analytical derivatives are easy to calculate and are
used instead. At each step of the iteration, the proposed value of $u$ is

$$u - [A + \nu I]^{-1} g.$$

The value of the damping parameter $\nu$ is initialised at 10, then increased by a factor of
10 if the proposed value is rejected, and decreased by a factor of 10 if the proposed value is
accepted. The algorithm is deemed to have converged if $\|g\|$, the Euclidean norm of $g$,
falls below a small tolerance, or if the sum of squared residuals (scaled by their standard
deviations) falls below a second small tolerance. The algorithm is terminated after a maximum
of 10 iterations.

In each case, each component of $u$ is initialised with a draw from $N(0, 2)$ (recall that
each $u$ variate is, by construction, standard normal). We made this choice to ensure that
the algorithm explores the tails of the distribution with reasonable probability.
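
The accept/reject rule for the damping parameter can be sketched in a few lines. The Python code below is a simplified illustration and not the MINPACK routine used in the paper: it applies a damped Newton step with a Gauss-Newton Hessian to a hypothetical scalar problem, with damping initialised at 10, scaled by factors of 10, and at most 10 iterations. The tolerance `1e-6`, the function names and the toy objective are assumptions.

```python
import numpy as np

def damped_newton(objective, grad, hess, u0, nu0=10.0, factor=10.0,
                  max_iter=10, tol=1e-6):
    """Minimise `objective` with Levenberg-Marquardt style damping."""
    u = np.asarray(u0, dtype=float)
    nu = nu0
    f = objective(u)
    for _ in range(max_iter):
        g = grad(u)
        if np.linalg.norm(g) < tol:            # assumed convergence tolerance
            break
        A = hess(u)
        u_prop = u - np.linalg.solve(A + nu * np.eye(u.size), g)
        f_prop = objective(u_prop)
        if f_prop < f:                         # accept: move and relax the damping
            u, f = u_prop, f_prop
            nu /= factor
        else:                                  # reject: stay and increase the damping
            nu *= factor
    return u

# Hypothetical problem: standard normal prior on u and a precise observation
# y = h(u) + noise with h(u) = u + 0.1 u^2 and noise variance r.
y_obs, r = 0.5, 0.01

def h(u):
    return u[0] + 0.1 * u[0] ** 2

def objective(u):                              # negative log-posterior up to a constant
    return 0.5 * u[0] ** 2 + 0.5 * (y_obs - h(u)) ** 2 / r

def grad(u):
    jac = 1.0 + 0.2 * u[0]
    return np.array([u[0] - jac * (y_obs - h(u)) / r])

def hess(u):                                   # Gauss-Newton approximation
    jac = 1.0 + 0.2 * u[0]
    return np.array([[1.0 + jac ** 2 / r]])

u_start = np.zeros(1)
u_hat = damped_newton(objective, grad, hess, u_start)
print("mode of u:", u_hat, "gradient norm:", np.linalg.norm(grad(u_hat)))
```

In the setting of this appendix the starting value would be a random draw for each particle rather than zero.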



Appendix C Asset Pricing Approximation 

Taking a third-order Taylor approximation of equation (21), then evaluating the coefficients
with $s_t = s_{t+1} = b_t = v_{t+1} = 0$, gives the following values for the $\Gamma_j$
coefficients in equation (23), where $G = \exp(g)$ and $g$ is the growth rate from equation (17):



G^ -^G' 

(1 - <t^)G^+^M_ 
[G^ - l3G){G"i - <p(5Gy 

1^ ^ {l-<^){G^-^G)^{G^ + (fPG) 
2 ° ^ l3G{G"i - (l)^G) 

1 (1 - 0)3G''+i^73(g2t + 2;50GT+i + 2^(/)2G^+i + 'p^cfy^G'^) 
6 (G^ - ;SG)(G-> - (A^G)(G^ - (/.2;5G)(G^ - ' 
Figure 1: New state $x_t$ as a function of shock $u_t$ (left) and log-posterior of $u_t$ (right)
for a realisation of the quadratic AR(1) model

Figure 2: Time series plots of P/D growth and C growth used for the asset pricing model

Figure 3: Bivariate scatter-plot of the MCMC draws of $\beta$ and $\phi$ for the asset pricing model



N particles    Median loglikelihood    IQR        Median std. dev.

Standard Particle Filter
100            -3241.7                 2876.8     4028.3
500            -285.8                  588.7      847.8
1000           -118.8                  500.9      334.4
2000           -80.0                   206.9      104.1
5000           -69.6                   35.1       21.7
7500           -68.3                   37.7       9.0
15000          -67.7                   6.4        2.3

CUPFl
50             -7587.2                 3203.8     6278.5
100            -3255.3                 1881.4     4197.5
150            -1924.8                 1278.6     3048.9

Auxiliary Disturbance Particle Filter
50             -67.2                   0.15       0.51

Table 1: Quadratic AR(1) model. Low nonlinearity, high signal to noise ratio.
N particles    Median loglikelihood    IQR        Median std. dev.

Standard Particle Filter
100            -2062.7                 3722.3     3251.4
500            -122.5                  545.8      240.4
1000           -56.0                   98.5       53.8
2000           -40.3                   49.9       11.7
5000           -37.3                   4.8        2.5
7500           -36.8                   1.84       1.55
15000          -36.5                   0.13       0.91

CUPFl
50             -7915.0                 12404      9644.9
100            -2011.1                 6099.1     3618.4
150            -960.3                  3168.2     2149.3

Auxiliary Disturbance Particle Filter
50             -38.2                   0.3        1.2

Table 2: Quadratic AR(1) model. High nonlinearity, high signal to noise ratio.



N particles    Loglikelihood variance         Loglikelihood bias

Standard Particle Filter
100            0.3824 (0.3816, 0.3837)        -0.190
500            0.0809 (0.0807, 0.0812)        -0.028
1000           0.0403 (0.0402, 0.0404)        -0.001
2000           0.0211 (0.0211, 0.0212)        -0.006
5000           0.0081 (0.00808, 0.00813)      0.010
7500           0.0052 (0.00519, 0.00522)      0.007
15000          0.0027 (0.00269, 0.00271)      0.010

CUPFl
50             0.8731 (0.8712, 0.8761)        -0.416
100            0.3639 (0.3631, 0.3652)        -0.196
150            0.2869 (0.2863, 0.2879)        -0.121

Auxiliary Disturbance Particle Filter
50             0.1076 (0.1074, 0.1080)        -0.117

Table 3: Quadratic AR(1) model. Low nonlinearity, low signal to noise ratio. The 95%
confidence intervals for the estimated variances are in brackets.
N particles    Loglikelihood variance         Loglikelihood bias

Standard Particle Filter
100            14.99 (14.96, 15.04)           -2.90
500            0.8199 (0.8181, 0.8228)        -0.30
1000           0.3026 (0.3020, 0.3037)        -0.13
2000           0.1478 (0.1475, 0.1483)        -0.05
5000           0.0554 (0.0553, 0.0556)        -0.001
7500           0.0354 (0.0353, 0.0355)        -0.008
15000          0.0168 (0.0168, 0.0169)        0.006

CUPFl
50             43.4 (43.3, 43.6)              -6.50
100            16.6 (16.6, 16.7)              -3.06
150            8.17 (8.16, 8.20)              -1.66

Auxiliary Disturbance Particle Filter
50             0.623 (0.622, 0.626)           -0.57

Table 4: Quadratic AR(1) model. High nonlinearity, low signal to noise ratio. The 95%
confidence intervals for the estimated variances are in brackets.
N particles    Loglikelihood variance                        Loglikelihood bias

Standard Particle Filter
100            1.62 x 10^7 (1.61 x 10^7, 1.63 x 10^7)        -4396.2
500            7.19 x 10^5 (7.17 x 10^5, 7.21 x 10^5)        -568.6
1000           1.12 x 10^5 (1.116 x 10^5, 1.122 x 10^5)      -198.6
2000           1.083 x 10^4 (1.081 x 10^4, 1.087 x 10^4)     -53.6
5000           470.3 (469.2, 471.9)                          -9.1
7500           81.8 (81.7, 82.1)                             -3.8
15000          5.33 (5.32, 5.35)                             -1.0

CUPFl
50             3.94 x 10^7 (3.93 x 10^7, 3.96 x 10^7)        -8611.5
100            1.762 x 10^7 (1.758 x 10^7, 1.768 x 10^7)     -4538.7
150            9.30 x 10^6 (9.28 x 10^6, 9.33 x 10^6)        -2972.2

Auxiliary Disturbance Particle Filter
50             0.2607 (0.2601, 0.2616)                       -0.05

Table 5: Quadratic AR(1) model. Low nonlinearity, high signal to noise ratio. The 95%
confidence intervals for the estimated variances are in brackets.
N particles    Loglikelihood variance                        Loglikelihood bias

Standard Particle Filter
100            1.057 x 10^7 (1.055 x 10^7, 1.061 x 10^7)     -3129.4
500            5.781 x 10^4 (5.768 x 10^4, 5.801 x 10^4)     -156.4
1000           2897.9 (2891.7, 2908.0)                       -37.5
2000           135.7 (135.4, 136.2)                          -7.8
5000           6.09 (6.07, 6.11)                             -1.27
7500           2.394 (2.389, 2.402)                          -0.57
15000          0.821 (0.820, 0.824)                          -0.10

CUPFl
50             9.30 x 10^7 (9.28 x 10^7, 9.33 x 10^7)        1.10 x 10^4
100            1.309 x 10^7 (1.306 x 10^7, 1.314 x 10^7)     -3194.6
150            4.62 x 10^6 (4.61 x 10^6, 4.64 x 10^6)        -1653.8

Auxiliary Disturbance Particle Filter
50             1.522 (1.519, 1.527)                          -1.90

Table 6: Quadratic AR(1) model. High nonlinearity, high signal to noise ratio. The 95%
confidence intervals for the estimated variances are in brackets.



Parameter       Distribution    Mean     Std. dev.
$\rho$          Beta            0.8      0.1
$\alpha$        Beta            0.333    0.015
$\delta$ (%)    Gamma           0.5      0.07
$\sigma$        Gamma           0.01     0.01

Table 7: Priors for the structural parameters for the neoclassical growth model.
N particles    Acceptance rate    Inefficiencies
                                  $\alpha$    $\delta$    $\rho$    $\sigma$

Standard Particle Filter
500            2.1                769.7       580.4       491.2     404.1
1500           12.3               43.4        87.0        52.7      56.2
5000           21.8               18.3        19.5        20.4      18.7
10000          24.9               16.8        14.5        18.0      16.5

Auxiliary Disturbance Particle Filter
30             15.2               41.1        27.5        31.2      31.8
50             19.1               25.0        22.1        21.3      23.4
75             21.0               23.3        20.3        20.5      20.4
100            23.3               18.0        17.5        16.4      16.7
150            24.6               22.5        16.1        17.5      18.8
200            25.1               19.2        16.0        17.0      16.8
300            25.9               15.7        13.5        14.5      16.9

Table 8: Metropolis-Hastings acceptance rates and inefficiencies of the parameter estimates
for the growth model.
N particles    Loglikelihood variance    Computing time (scaled by a power of ten)
                                         $\alpha$    $\delta$    $\rho$    $\sigma$

Standard Particle Filter
500            24.03                     192.4       145.1       122.8     101.0
1500           2.03                      32.5        65.3        39.5      42.2
5000           0.68                      45.8        48.7        51.1      46.7
10000          0.28                      84.0        72.3        90.2      82.3

Auxiliary Disturbance Particle Filter
30             1.72                      9.9         6.6         7.5       7.6
50             1.06                      10.0        8.8         8.5       9.4
75             0.67                      14.0        12.2        12.3      12.2
100            0.39                      14.4        14.0        13.1      13.4
150            0.43                      27.0        19.3        21.0      22.6
200            0.25                      30.8        25.6        27.1      26.9
300            0.15                      37.6        32.5        34.8      40.5

Table 9: Loglikelihood variances and computing times for the neoclassical growth model. The
variances of the loglikelihoods are calculated at the true parameter values, $\alpha = 1/3$,
$\rho = 0.8$, $\delta = 0.05$ and $\sigma^2 = 0.02^2$. The value of $k = 16$ is used in
calculating computing times.



Parameter            Distribution    Mean    Std. dev.
$\gamma$             Gamma           2       0.5
$g$ (%)              Gamma           1.9     0.15
$r^f$ (%)            Normal          1.0     0.01
$\phi$               Beta            0.8     0.1
$\sigma_v$ (%)       Gamma           0.8     0.03
$\sigma_\eta$ (%)    Gamma           0.1     0.03
$\sigma_e$ (%)       Gamma           5       0.7

Table 10: Priors for the structural parameters in the asset pricing model. Values for $g$ and
$r^f$ and their standard deviations are in annualised percentage terms. The prior distribution
of $r^f$ is truncated to have positive support.
Posterior Mean (Std. Dev. in brackets)

N particles   $g$ (%)      $\gamma$       $\phi$         $r^f$ (%)    $\sigma_e$ (%)   $\sigma_\eta$ (%)   $\sigma_v$ (%)

Standard Particle Filter
200           2.9 (0.28)   0.93 (0.356)   0.97 (0.007)   1.0 (0.01)   7.4 (0.43)       0.8 (0.07)          0.8 (0.05)
500           2.8 (0.20)   0.87 (0.334)   0.97 (0.008)   1.0 (0.01)   7.5 (0.50)       0.8 (0.06)          0.8 (0.04)
1000          2.9 (0.22)   0.94 (0.399)   0.97 (0.009)   1.0 (0.01)   7.3 (0.48)       0.8 (0.08)          0.8 (0.03)

Auxiliary Disturbance Particle Filter
30            2.9 (0.19)   0.86 (0.309)   0.97 (0.007)   1.0 (0.01)   7.4 (0.46)       0.8 (0.06)          0.8 (0.02)
50            2.8 (0.21)   0.85 (0.309)   0.97 (0.008)   1.0 (0.01)   7.4 (0.43)       0.8 (0.06)          0.8 (0.03)

Table 11: Parameter estimates for the asset pricing model with posterior standard deviations
in brackets. Values for $g$ and $r^f$ and their standard deviations are in annualised
percentage terms.



N particles   Acc. rate   Inefficiencies
                          $g$      $\gamma$   $\phi$   $r^f$    $\sigma_e$   $\sigma_\eta$   $\sigma_v$

Standard Particle Filter
200           5.0         115.8    195.9      444.5    129.0    477.6        359.9           128.7
500           11.4        59.7     63.1       58.7     63.1     48.9         52.9            34.3
1000          13.9        39.6     42.7       48.3     42.6     57.8         30.5            55.5

Auxiliary Disturbance Particle Filter
30            11.4        52.2     38.0       50.0     112.5    49.3         51.9            54.9
50            14.1        36.5     85.0       98.7     58.2     48.4         35.7            47.9

Table 12: MCMC parameter inefficiencies for the asset pricing model.



N particles   Computing Times
              $g$      $\gamma$   $\phi$   $r^f$    $\sigma_e$   $\sigma_\eta$   $\sigma_v$

Standard Particle Filter
200           57.4     97.2       220.5    64.0     236.9        178.5           63.8
500           74.0     78.2       72.8     78.2     60.6         65.6            42.5
1000          98.3     105.8      119.8    105.6    143.2        75.6            137.7

Auxiliary Disturbance Particle Filter
30            27.6     20.1       26.4     59.4     26.0         27.4            29.0
50            32.1     74.8       86.9     51.3     42.6         31.4            42.2

Table 13: MCMC computing times for the parameters of the asset pricing model using the
factor $k = 7.1$ for the ADPF.